Contributor

Copilot AI commented Aug 23, 2025

This PR provides a comprehensive analysis and implementation plan for incremental builds in Blake, addressing the current inefficiency where blake bake regenerates the entire site on every run regardless of changes.

Problem Analysis

Currently, Blake processes all markdown files and regenerates all output files on every build, even when no source files have changed. This results in:

  • Unnecessary processing time for large sites
  • Complete regeneration of all .razor files and content index
  • No optimization for CI/CD pipelines or development workflows

Evaluation of Options

After evaluating all proposed approaches:

  1. Hash-based content tracking - Accurate but still requires reading all files
  2. Timestamp-based tracking - Fast but unreliable across systems
  3. In-memory file watcher - Perfect for development but limited to blake serve
  4. Build cache manifest - ✅ Recommended approach

Recommended Solution: Build Cache Manifest

The analysis recommends implementing a build cache manifest using a .blake-cache.json file that tracks:

  • Content hashes of markdown and template files
  • Last modified timestamps and build metadata
  • Configuration changes that affect output
  • Template-to-content mappings for intelligent invalidation
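
As a rough illustration of what such a manifest might hold, here is a minimal sketch in Python (Blake itself is a .NET tool, so this is a language-neutral illustration only). The class names, field names, and JSON layout are all assumptions made for the example; the PR does not specify the actual schema of .blake-cache.json.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class CacheEntry:
    """One tracked markdown file (hypothetical shape, not Blake's schema)."""
    content_hash: str     # SHA-256 of the markdown source
    last_modified: float  # build metadata; informational, not used for decisions
    template: str         # template the page was rendered with
    output_path: str      # generated .razor file


@dataclass
class BuildManifest:
    version: int = 1
    config_hash: str = ""                          # hash of output-affecting config
    templates: dict = field(default_factory=dict)  # template path -> content hash
    entries: dict = field(default_factory=dict)    # markdown path -> CacheEntry

    def pages_using(self, template: str) -> list:
        """Template-to-content mapping: pages to invalidate if `template` changes."""
        return sorted(p for p, e in self.entries.items() if e.template == template)

    def to_json(self) -> str:
        # asdict() recurses into the nested CacheEntry values.
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

Keeping the template name on each entry is one way to answer "which pages does this template affect?" without a separate reverse index; a real implementation might choose differently.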

Key Benefits

  • Universal: Works for development, CI/CD, and local builds
  • Comprehensive: Tracks all build inputs (markdown, templates, configuration)
  • Reliable: Change detection relies on content hashes, not on file system timestamps alone
  • Standard Practice: Similar to package-lock.json in other build systems
  • Performance: Expected 80-95% build time reduction for typical change scenarios

Implementation Plan

The analysis provides a detailed 5-phase implementation plan with 15 sub-issues:

Phase 1: Core Infrastructure

  • Design IBuildCache interface and data structures
  • Implement SHA256-based file content hashing
  • Create JSON persistence with atomic operations and corruption recovery
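
The three Phase 1 items could be sketched roughly as follows. This is a Python illustration under stated assumptions, not the IBuildCache design from the PR: the function names are hypothetical, the manifest is treated as a plain dictionary, and "corruption recovery" is interpreted here as falling back to an empty cache, which simply forces a full rebuild.

```python
import hashlib
import json
import os
import tempfile


def hash_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def save_manifest(path, manifest):
    """Write the manifest atomically: temp file in the same directory, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(manifest, f, indent=2)
            f.flush()
            os.fsync(f.fileno())
        # os.replace swaps the file in one step, so readers never see a partial write.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_manifest(path):
    """Load the manifest; a missing or corrupt file yields an empty cache."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            data = json.load(f)
    except (FileNotFoundError, ValueError):  # ValueError covers invalid JSON
        return {}
    return data if isinstance(data, dict) else {}
```

Treating an unreadable cache as empty keeps the failure mode safe: the worst case is one full rebuild rather than stale output.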

Phase 2: SiteGenerator Integration

  • Modify SiteGenerator.BuildAsync for cache-aware processing
  • Add template change detection and cache invalidation
  • Intelligently regenerate GeneratedContentIndex.cs only when needed
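
The decision logic behind a cache-aware build might look something like the sketch below. It is written in Python for illustration and does not reflect SiteGenerator.BuildAsync itself; the function name and parameters are hypothetical. It also assumes that the content index depends only on page content and the set of pages, so a template-only change does not require regenerating it.

```python
def plan_rebuild(current_hashes, cached_hashes, page_templates, changed_templates):
    """Decide what a cache-aware build has to do.

    current_hashes:    markdown path -> hash computed for this run
    cached_hashes:     markdown path -> hash stored by the previous run
    page_templates:    markdown path -> template used to render it
    changed_templates: set of templates whose content changed

    Returns (pages_to_rebuild, removed_pages, index_dirty).
    """
    # New or modified pages: no cached hash, or the hash differs.
    content_changed = {
        page for page, digest in current_hashes.items()
        if cached_hashes.get(page) != digest
    }
    # Unchanged pages that must still be re-rendered because their template changed.
    template_affected = {
        page for page in current_hashes
        if page_templates.get(page) in changed_templates
    }
    # Pages present in the cache but no longer on disk.
    removed = set(cached_hashes) - set(current_hashes)

    # Assumption: the index changes only when page content or the page set changes.
    index_dirty = bool(content_changed or removed)
    return sorted(content_changed | template_affected), sorted(removed), index_dirty
```

For example, with an unchanged set of pages and one changed template, only the pages mapped to that template are returned and the index is left alone.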

Phase 3: CLI Enhancements

  • Add --force and --clean-cache command-line options
  • Enhance logging to show cache performance and diagnostics
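
To make the intended behaviour of the two proposed options concrete, here is a small Python sketch using argparse as a stand-in for Blake's real command-line parsing. Only the flag names come from the plan above; the semantics (--force ignores the cache for one run, --clean-cache deletes it first) and the mode names are assumptions for illustration.

```python
import argparse


def build_parser():
    # Stand-in parser; Blake's actual CLI is a .NET tool and is not modelled here.
    parser = argparse.ArgumentParser(prog="blake bake")
    parser.add_argument("--force", action="store_true",
                        help="ignore the cache and rebuild everything (assumed meaning)")
    parser.add_argument("--clean-cache", action="store_true",
                        help="delete the cache file before building (assumed meaning)")
    return parser


def resolve_mode(args):
    """Map parsed flags to a build mode (hypothetical mode names)."""
    if args.clean_cache:
        return "clean-then-full"
    if args.force:
        return "full"
    return "incremental"


def cache_summary(rebuilt, total):
    """One possible log line reporting how much work the cache saved."""
    skipped = total - rebuilt
    return f"Rebuilt {rebuilt} of {total} pages ({skipped} served from cache)"
```

In this sketch, passing both flags behaves like --clean-cache alone, since a build with no cache is already a full build.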

Phase 4: Advanced Features

  • Integrate file watcher with cache for blake serve
  • Add configuration change detection
  • Enable plugin cache participation

Phase 5: Testing & Documentation

  • Comprehensive unit and integration tests
  • Performance benchmarks and documentation updates

Files Added

  • docs/incremental-builds-analysis.md - Detailed evaluation of all options with pros/cons
  • docs/incremental-builds-backlog.md - Complete implementation backlog with acceptance criteria for 15 sub-issues
  • docs/incremental-builds-summary.md - Executive summary for creating GitHub issues
  • docs/build-cache-implementation.md - Technical implementation strategy with code examples

Expected Impact

  • Small changes: 80-95% reduction in build time
  • No changes: 90-99% reduction in build time
  • Template changes: Only rebuild affected content
  • Large sites: Significant improvement for sites with hundreds of pages

This analysis provides everything needed to begin implementation, including detailed acceptance criteria, code structure examples, and risk mitigation strategies. The approach follows Blake's philosophy of convention-over-configuration while providing substantial performance improvements.

Addressing #41.



Copilot AI changed the title from "[WIP] ✨ Feature: Incremental builds" to "✨ Feature: Incremental builds analysis and implementation plan" on Aug 23, 2025
Copilot AI requested a review from matt-goldman August 23, 2025 06:21
github-actions bot added the "preview" label (Triggers a release of preview version of Nuget packages) on Aug 23, 2025
@matt-goldman
Owner

Having read through this and taken some time to consider it, I'm abandoning this pull request. It's a clever solution, and that's the problem. It's not aligned with Blake's core philosophy. I'm taking inspiration from Python's ethos. Blake favours simplicity, transparency, predictability.

Bringing this in introduces too many variables. Does everyone get this? Would it be enabled by default and disabled on demand? Or vice versa? It also undermines the principle of the markdown content as source of truth. I can see how this may be valuable in cases where someone may have a docs repo with thousands or tens of thousands of files; but this is not currently a requirement.

The goal here is to support hot reload at dev time, and I think a simpler approach is with FileSystemWatcher.

